ABSTRACT

We  consider  the  problem  of  approximating  the  Hessian  matrix  of  a  smooth 
non-linear  function  using  a  minimum  number  of  gradient  evaluations,  particularly 
in  the  case  that  the  Hessian  has  a  known,  fixed  sparsity  pattern.  We  study  the 
class of Direct Methods for this problem, and propose two new ways of
classifying Direct Methods. Examples are given that show the relationships among
optimal methods from each class. The problem of finding a non-overlapping
direct cover is shown to be equivalent to a generalized graph coloring problem,
the distance-2 graph coloring problem. A theorem is proved showing that the
general distance-k graph coloring problem is NP-Complete for all fixed k ≥ 2,
and hence that the optimal non-overlapping direct cover problem is also
NP-Complete. Some worst-case bounds on the performance of a simple coloring
heuristic are given. An appendix proves a well-known folklore result, which
implies as a corollary that another class of methods, the Elimination Methods,
1.  Introduction

The  problem  of  interest  is  the  approximation  of  the  Hessian  matrix  of  a 
smooth nonlinear function $F : \mathbb{R}^n \to \mathbb{R}$. In many circumstances, it is difficult
or  even  impossible  to  evaluate  the  Hessian  of  F  from  its  exact  representation. 
Under  these  conditions,  an  approximation  to  the  Hessian  can  be  computed 
using  finite  differences  of  the  gradient.  When  the  Hessian  is  a  dense  matrix, 
this  approximation  is  usually  obtained  by  differencing  the  gradient  along  the 
coordinate  vectors,  and  hence  requires  n  evaluations  of  the  gradient  (which  is 
the  minimum  possible  number;  see  Appendix  1).  However,  if  the  Hessian  has  a 
fixed sparsity pattern at every point (i.e., certain elements are known to be zero),
the  Hessian  may  be  approximated  with  a  smaller  number  of  gradient  evaluations 
by  differencing  along  specially  selected  sets  of  vectors. 

For  example,  consider  the  following  sparsity  pattern,  in  which  0  stands  for 
a known zero and 1 stands for a possible non-zero of the Hessian:

$$
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 1 \\
0 & 1 & 1
\end{pmatrix}.
$$

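If the gradient is differenced along the directions $(1, 1, 0)^T$ and $(0, 0, 1)^T$, the
Hessian may be approximated with only two additional gradient evaluations: columns 1
and 2 have no unknowns in a common row, so their contributions to the first
difference can be separated.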

When  n  is  large  and  the  proportion  of  zero  elements  is  high,  the  number 
of  gradient  evaluations  needed  to  approximate  the  Hessian  may  be  only  a  small 
fraction of n. This result is particularly useful in numerical optimization algorithms.

Let $g(x)$ denote the gradient of $F$, and $H(x)$ denote the Hessian. Assume that
$g(x^0)$ is known, and that we wish to approximate $H(x^0)$ by evaluating $g(x^0 + h d^l)$,
$l = 1, \ldots, k$, for some step size $h$ and a set of $k$ difference vectors $\{d^l\}$. For
each $l$ and sufficiently small $h$, we obtain $n$ approximate linear equations

$$ h\,H(x^0)\,d^l \approx g(x^0 + h d^l) - g(x^0), \qquad (1) $$

so that there are a total of $nk$ equations. Note that many of the components of
$d^l$ and $H(x^0)$ are usually zero.
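As a minimal sketch of (1) on the 3 × 3 example above, the Python below assumes a hypothetical function $F(x) = e^{x_1} + x_2^2 x_3 + x_3^3$, chosen only because its Hessian has exactly that sparsity pattern ($h_{12} = h_{13} = 0$). The names `grad`, `exact_hessian` and `approx_hessian` are illustrative, not from the paper, and indices are 0-based in the code (1-based in the comments). Two forward differences, along $(1,1,0)^T$ and $(0,0,1)^T$, recover every unknown.

```python
import math


# Hypothetical F(x) = exp(x1) + x2^2 * x3 + x3^3, chosen so that h_12 = h_13 = 0,
# i.e. its Hessian has the 3 x 3 sparsity pattern of the example.
def grad(x):
    x1, x2, x3 = x
    return [math.exp(x1), 2.0 * x2 * x3, x2 ** 2 + 3.0 * x3 ** 2]


def exact_hessian(x):
    x1, x2, x3 = x
    return [[math.exp(x1), 0.0, 0.0],
            [0.0, 2.0 * x3, 2.0 * x2],
            [0.0, 2.0 * x2, 6.0 * x3]]


def approx_hessian(x0, h=1e-6):
    """Approximate H(x0) with two extra gradient evaluations, via (1)."""
    g0 = grad(x0)

    def difference(d):
        # (g(x0 + h d) - g(x0)) / h  ~  H(x0) d
        xp = [xi + h * di for xi, di in zip(x0, d)]
        return [(a - b) / h for a, b in zip(grad(xp), g0)]

    y1 = difference([1.0, 1.0, 0.0])  # ~ H e1 + H e2 = (h11, h22, h32)
    y2 = difference([0.0, 0.0, 1.0])  # ~ H e3 = (0, h23, h33)

    H = [[0.0] * 3 for _ in range(3)]
    H[0][0] = y1[0]                   # row 1: only column 1 of {1, 2} has an unknown
    H[1][1], H[2][1] = y1[1], y1[2]   # rows 2 and 3: only column 2 contributes
    H[1][2], H[2][2] = y2[1], y2[2]   # column 3 was differenced on its own
    return H


H = approx_hessian([0.5, -1.0, 2.0])
```

With h = 1e-6 the forward-difference error is of order h, so at this point the entries agree with `exact_hessian` to within about 1e-5; the grouping of columns, not the particular F, is what the sketch is meant to show.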

Schemes  for  evaluating  a  Hessian  approximation  have  been  divided  into 
three  categories,  depending  on  the  complexity  of  the  subsystem  of  (1)  that 
must be solved for the unknown elements of the Hessian (see, e.g., Coleman and
Moré (1981)). Direct Methods correspond to a diagonal subsystem; Substitution
Methods correspond to a triangular subsystem; and Elimination Methods correspond
simply to a nonsingular subsystem. There is a tradeoff here: as we move
from  Direct  Methods  to  Elimination  Methods,  we  are  less  restricted  and  thus 
expect  fewer  evaluations  to  be  required,  but  we  lose  ease  of  approximation  and 
possibly  numerical  stability. 
Once a class of methods has been selected, the problem is to choose a specific
method that minimizes k for a given sparsity pattern, without requiring too much
effort to determine the vectors $\{d^l\}$. This paper is concerned primarily with a
partial  solution  to  this  problem  for  direct  methods.  (The  solution  for  elimination 
methods  is  well  known,  but  seems  never  to  have  been  published.  Appendix  1 
gives  a  proof  of  the  theorem  behind  the  solution  in  this  case.)  Section  2  gives  a 
new  classification  for  direct  methods,  Section  3  reduces  one  of  these  classes  to  a 
graph  coloring  problem  and  shows  that  problem  to  be  NP-complete,  Section  4 
gives  some  heuristic  results  for  the  same  class,  and  Section  5  points  out  possible 
future  avenues  of  research. 


2.  Classifying  Direct  Methods 

Let H denote the Hessian of F at the point $x^0$. Any element of H that is
not known to be zero is called an unknown. An illuminating interpretation of a
direct method is to regard the non-zero components of a given $d^l$ as specifying a
subset $S_l$ of the columns of H; $S_l$ is called the $l$th group of columns. By a slight
abuse of notation, a column index j is said to belong to $S_l$ when its column does.
When two columns belonging to $S_l$ both have an unknown in row i, there is said
to be an overlap in $S_l$ in row i.

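By definition of a direct method, the family of subsets $\{S_l\}$ must satisfy
the Direct Cover Property (DC):

(DC) For each unknown $h_{ij}$, there must be at least one $S_l$ containing column
j such that column j is the only column in $S_l$ that has an unknown in
row i. (Since H is symmetric, $h_{ij}$ and $h_{ji}$ are the same unknown, so it
suffices that this hold for one of the two.)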

Any family of subsets of columns satisfying (DC) is called a direct cover for
H, and naturally gives a scheme for approximating H. That is, if $e_j$ is the $j$th
unit vector, differencing along

$$ d^l = \sum_{j \in S_l} e_j, \qquad l = 1, \ldots, k, $$

is the scheme associated with the family $\{S_l\}$. The problem of interest is thus
that  of  finding  a  minimum  cardinality  direct  cover  for  a  given  H  (an  optimal 
direct  cover). 
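The direct cover property can be checked mechanically. In this hypothetical sketch (not code from the paper), a sparsity pattern is a 0/1 list of lists `S`, each group $S_l$ is a Python set of 0-based column indices, and $h_{ij}$ and $h_{ji}$ are treated as the same unknown because H is symmetric; the examples later in this section rely on that reading of (DC). `difference_vectors` builds the $d^l$ of the scheme above.

```python
def unknown_pairs(S):
    """Index pairs (i, j), i <= j, with S[i][j] = 1; h_ij and h_ji count once."""
    n = len(S)
    return [(i, j) for i in range(n) for j in range(i, n) if S[i][j]]


def directly_determined(S, group, i, j):
    """True if column j lies in `group` and is the only column of the group
    with an unknown in row i, so h_ij can be read off that group's difference."""
    return j in group and all(not S[i][c] for c in group if c != j)


def is_direct_cover(S, groups):
    """Check (DC): every unknown is directly determined by some group,
    in either orientation h_ij or h_ji (H is symmetric)."""
    return all(
        any(directly_determined(S, g, i, j) or directly_determined(S, g, j, i)
            for g in groups)
        for i, j in unknown_pairs(S))


def difference_vectors(n, groups):
    """d^l = sum of the unit vectors e_j over the columns j in S_l."""
    return [[1 if j in g else 0 for j in range(n)] for g in groups]


# The 3 x 3 example of Section 1, with 0-based column indices.
S = [[1, 0, 0],
     [0, 1, 1],
     [0, 1, 1]]
print(is_direct_cover(S, [{0, 1}, {2}]))      # True: the scheme (1,1,0)^T, (0,0,1)^T
print(is_direct_cover(S, [{0, 1, 2}]))        # False: columns 2 and 3 overlap
print(difference_vectors(3, [{0, 1}, {2}]))   # [[1, 1, 0], [0, 0, 1]]
```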

Since  it  is  difficult  to  find  a  general  optimal  direct  cover,  the  problem  is 
often  approached  heuristically  by  restricting  the  acceptable  direct  covers  and 
attempting  to  choose  an  optimal  or  near-optimal  direct  cover  from  the  restricted 
set.  We  suggest  a  new  classification  scheme  for  types  of  permitted  direct  covers. 
From  most  to  least  restrictive,  the  categories  are: 

(1)  Non-Overlap  Direct  Covers  (NDC):  No  overlap  may  occur  within  any  group 
of columns, i.e., every group has at most one unknown in each row. The best-known
heuristic, the CPR method, belongs to this class (see Curtis, Powell and
Reid  (1974)). 

(2)  Sequentially  Overlapping  Direct  Covers  (SeqDC):  A  less  restricted  class  may 
be  defined  by  observing  that  overlap  within  a  group  of  an  ordered  direct  cover 
does  not  violate  the  direct  cover  property  if  the  values  of  overlapping  unknowns 
can be resolved using preceding groups. In a SeqDC, columns in group l are
allowed to overlap in row m only if column m belongs to some group k such that
k < l. This definition implies that (DC) is satisfied. Consider an unknown $h_{ij}$,
and let $k = \min\{\, l \mid \text{either } i \text{ or } j \text{ belongs to group } l \,\}$ ($S_k$ is called the minimum
index group of $h_{ij}$). Neither column i nor column j belongs to a group preceding $S_k$,
so whichever of $h_{ij}$, $h_{ji}$ lies in a column of $S_k$ cannot overlap with any other
unknown in its row in group k. Powell and Toint (1979) propose a heuristic of this class.

(3)  Simultaneously  Overlapping  Direct  Covers  (SimDC):  This  is  the  most  general 
class  of  direct  covers.  Any  kind  of  overlap  is  allowed,  as  long  as  (DC)  is  not 
violated.  In  particular,  an  unknown  may  overlap  in  its  row  even  in  its  minimum 
index  group  as  long  as  the  overlap  is  resolved  in  some  succeeding  group.  Thapa’s 
“New  Direct  Method”  (1980)  falls  in  this  class,  though  he  adds  several  other 
restrictions. 

Note that NDC ⊂ SeqDC ⊂ SimDC; these inclusions are strict, as the
following  examples  show.  Consider 

$$
\begin{pmatrix}
1 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 1
\end{pmatrix}.
$$

Since  every  column  overlaps  with  every  other  column,  an  NDC  must  use  at  least 
five groups (and of course, five suffice). But {{3}, {2, 4}, {1}, {5}}, in which the
overlap of columns 2 and 4 in row 3 is resolved by the preceding group {3}, is a SeqDC
of  cardinality  4  <  5.  Now  consider  (from  Powell  and  Toint  (1979)) 

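$$
\begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}.
$$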

It  is  easy  to  see  that  any  SeqDC  requires  at  least  four  groups,  but  {  { 1, 5  },  {  2, 6  }, 
{  3, 4  }  }  is  a  SimDC  of  size  only  three. 
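The claims about the two examples can be confirmed by brute force on such small patterns. The sketch below uses hypothetical helper names and 0-based columns, so the paper's groups {3}, {2,4}, {1}, {5} become {2}, {1,3}, {0}, {4}. `is_seqdc` reads the list order as the group order and checks the SeqDC rule literally; `ordered_partitions` enumerates every assignment of columns to at most k ordered groups, which is cheap here (at most 3^6 or 4^5 cases) but exponential in general.

```python
from itertools import product

# The two patterns of this section, rows and columns 0-based.
CYCLIC = [[1, 1, 0, 0, 1],
          [1, 1, 1, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 1, 1, 1],
          [1, 0, 0, 1, 1]]

POWELL_TOINT = [[1, 1, 1, 1, 0, 0],
                [1, 1, 1, 0, 1, 0],
                [1, 1, 1, 0, 0, 1],
                [1, 0, 0, 1, 0, 0],
                [0, 1, 0, 0, 1, 0],
                [0, 0, 1, 0, 0, 1]]


def overlap_rows(S, group):
    """Rows in which at least two columns of `group` have an unknown."""
    return {r for r in range(len(S)) if sum(S[r][c] for c in group) >= 2}


def is_ndc(S, groups):
    """Non-Overlap Direct Cover: no group overlaps in any row."""
    return all(not overlap_rows(S, g) for g in groups)


def is_seqdc(S, groups):
    """Ordered groups: group l may overlap in row m only if column m is in an earlier group."""
    earlier = set()
    for g in groups:
        if not overlap_rows(S, g) <= earlier:
            return False
        earlier |= set(g)
    return True


def is_direct_cover(S, groups):
    """(DC), with h_ij and h_ji treated as one unknown."""
    def direct(g, i, j):
        return j in g and all(not S[i][c] for c in g if c != j)
    n = len(S)
    return all(any(direct(g, i, j) or direct(g, j, i) for g in groups)
               for i in range(n) for j in range(i, n) if S[i][j])


def ordered_partitions(n, k):
    """Every ordered partition of columns 0..n-1 into at most k non-empty groups."""
    for labels in product(range(k), repeat=n):
        groups = [{c for c in range(n) if labels[c] == g} for g in range(k)]
        yield [g for g in groups if g]


seq = [{2}, {1, 3}, {0}, {4}]     # the paper's {3}, {2,4}, {1}, {5}
sim = [{0, 4}, {1, 5}, {2, 3}]    # the paper's {1,5}, {2,6}, {3,4}
print(is_ndc(CYCLIC, seq), is_seqdc(CYCLIC, seq), is_direct_cover(CYCLIC, seq))  # False True True
print(is_seqdc(POWELL_TOINT, sim), is_direct_cover(POWELL_TOINT, sim))           # False True
```

Exhaustive searches such as `any(is_seqdc(POWELL_TOINT, p) for p in ordered_partitions(6, 3))` return False, in line with the claim that any SeqDC for the second pattern needs at least four groups; likewise no ordered partition of the cyclic pattern into four groups is an NDC.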

The  above  discussion  glossed  over  whether  a  column  can  belong  to  more  than 
one  group.  This  consideration  leads  to  an  independent  classification  scheme  for 
direct  covers  into: 

(1)  Partitioned  Direct  Covers  (prefix  P):  These  require  the  direct  cover  to  be  a 
partition of {1, 2, …, n}, i.e., every column must belong to exactly one group.
(2) General Direct Covers (prefix G): These allow either columns that belong to
no  group,  or  columns  that  belong  to  more  than  one  group. 

All heuristics proposed so far that are known to this author restrict themselves to
partitioned direct covers. When $h_{ii}$ is an unknown, column i must belong to
at least one group, for otherwise $h_{ii}$ would not be determined. Since $h_{ii}$ is an
unknown for all i in most unconstrained problems, it seems natural to restrict
our attention to direct covers in which each column belongs to at least one group.
However,  sometimes  the  optimal  direct  cover  is  larger  under  this  restriction.  For 
example,  consider  an  NDC  for 


Since  all  three  pairs  of  columns  overlap,  a  PNDC  must  use  three  groups;  however, 
{{2}, {3}}  is  a  valid  GNDC  of  smaller  size.  But  such  problems  rarely  occur  in 
unconstrained  problems,  and  we  shall  henceforth  consider  only  direct  covers  in 
which  every  column  belongs  to  at  least  one  group. 

From the remark above that the definition of a SeqDC implies (DC), it is easy to
see that in any GSeqDC we can delete a column from all groups in which it appears
except for its minimum index group, without violating (DC) (i.e., since any $h_{ij}$
is always determined directly by some column in that column's minimum index
group, any later occurrences of that column are superfluous). Thus in the SeqDC
case, and so also in the NDC case, it suffices to consider only partitioned
direct covers.

But, unfortunately, PSimDCs are not always optimal in the class of GSimDCs.
Consider


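$$
\begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}
\qquad (2)
$$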


Laborious  calculations  verify  that  any  PSimDC  must  have  more  than  four  groups. 
However, {{1, 2}, {1, 3}, {4, 6}, {5, 7}} is a GSimDC of size four where column
1  appears  in  two  different  groups.  (The  matrix  (2)  is  the  smallest  possible  such 
example  in  terms  of  number  of  columns.) 
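The "laborious calculations" behind the claim about (2) are easy to automate. The hypothetical sketch below (0-based columns) enumerates every partition of the seven columns into at most four groups, once each via restricted-growth recursion, 715 partitions in all, and tests each against (DC), again with $h_{ij}$ and $h_{ji}$ treated as one unknown. It then tests the four-group GSimDC quoted above, in which column 1 (index 0) appears twice.

```python
MATRIX_2 = [[1, 0, 0, 1, 1, 1, 1],
            [0, 1, 1, 0, 1, 1, 0],
            [0, 1, 1, 1, 0, 0, 1],
            [1, 0, 1, 1, 1, 0, 0],
            [1, 1, 0, 1, 1, 0, 0],
            [1, 1, 0, 0, 0, 1, 0],
            [1, 0, 1, 0, 0, 0, 1]]


def is_direct_cover(S, groups):
    """(DC) with h_ij and h_ji treated as one unknown; groups may share columns."""
    def direct(g, i, j):
        return j in g and all(not S[i][c] for c in g if c != j)
    n = len(S)
    return all(any(direct(g, i, j) or direct(g, j, i) for g in groups)
               for i in range(n) for j in range(i, n) if S[i][j])


def set_partitions(n, max_blocks):
    """Yield each partition of range(n) into at most max_blocks blocks exactly once."""
    def extend(c, blocks):
        if c == n:
            yield [set(b) for b in blocks]
            return
        for idx in range(len(blocks)):      # put column c into an existing block
            blocks[idx].append(c)
            yield from extend(c + 1, blocks)
            blocks[idx].pop()
        if len(blocks) < max_blocks:        # or open a new block for it
            blocks.append([c])
            yield from extend(c + 1, blocks)
            blocks.pop()
    yield from extend(0, [])


partitions = list(set_partitions(7, 4))
print(len(partitions))                                              # 715
print(any(is_direct_cover(MATRIX_2, p) for p in partitions))        # False
print(is_direct_cover(MATRIX_2, [{0, 1}, {0, 2}, {3, 5}, {4, 6}]))  # True
```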


3.  An  Equivalent  Graph  Coloring  Problem;  NP-Completeness 

We  show  that  the  problem  of  finding  an  optimal  NDC  (which  can  be  assumed 
to  be  partitioned)  is  equivalent  to  a  graph  coloring  problem,  which  is  then  shown 
to be NP-Complete. Our notation and terminology for graphs follow that of
Bondy  and  Murty  (1976).  Our  way  of  writing  a  sparsity  pattern  as  a  (0,1)- 
matrix,  call  it  S,  could  just  as  well  be  interpreted  as  the  vertex-vertex  incidence 
matrix  of  an  undirected  graph.  That  is,  the  symmetry  of  the  sparsity  pattern  is 
reflected  in  the  undirectedness  of  the  graph.  A  partition  of  the  columns  of  the 
sparsity  pattern  naturally  induces  a  partition  of  the  vertices  of  the  associated 
graph;  the  vertex  partition  can  be  considered  to  be  some  sort  of  coloring  on  the 
graph. 

Two columns in the same group, i.e., two vertices of the same color, cannot
“overlap”. Column i overlaps column j if $s_{ki} = s_{kj} = 1$ for some row k, i.e.,
if vertex i and vertex j are both adjacent to vertex k (where, since the diagonal
entries of S are 1, k may be i or j itself). Thus the restriction on
our graph coloring is that no two vertices of the same color can have a common
neighbor. If the distance from vertex i to vertex j in the graph is measured by
“minimum number of edges in any path between i and j”, then we require that
any two vertices of the same color must be more than two units apart. In the
usual Graph Coloring Problem, we require that any two vertices of the same
color be more than one unit apart. This leads to defining a proper distance-k
coloring  of  a  graph  G  to  be  a  partition  of  the  vertices  of  G  into  classes  (colors) 
so  that  any  two  vertices  of  the  same  color  are  more  than  k  units  apart.  Then  we 
want  to  solve  the 

Distance-k Graph Coloring Problem (DkGCP) on a graph G: Find a proper
distance-k coloring of G in the minimum possible number of colors.

Then  the  usual  Graph  Coloring  Problem  (GCP)  is  D1GCP,  and  the  optimal 
NDC  problem  is  equivalent  to  D2GCP.  We  shall  use  this  equivalence  to  show 
that the optimal NDC problem is NP-Complete by showing that D2GCP is
NP-Complete; in fact, we shall show the stronger result that DkGCP is NP-Complete
for any fixed k ≥ 2.
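For small patterns the equivalence between NDCs and distance-2 colorings can be checked exhaustively. In this hypothetical sketch the graph of a pattern `S` has an edge {i, j} whenever i ≠ j and `S[i][j] == 1`, a breadth-first search cut off at depth k collects the vertices within distance k, and a coloring is a list giving each column's group. It assumes every diagonal entry of `S` is 1 (all $h_{ii}$ are unknowns), which is what makes two adjacent columns overlap.

```python
from collections import deque
from itertools import product

# The cyclic 5 x 5 pattern of Section 2 (0-based); its graph is a 5-cycle.
CYCLIC = [[1, 1, 0, 0, 1],
          [1, 1, 1, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 1, 1, 1],
          [1, 0, 0, 1, 1]]


def pattern_graph(S):
    """Adjacency lists: i ~ j iff i != j and S[i][j] = 1."""
    n = len(S)
    return [[j for j in range(n) if j != i and S[i][j]] for i in range(n)]


def within_distance(adj, src, k):
    """Vertices at distance 1..k from src (breadth-first search cut off at depth k)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return {v for v, d in dist.items() if d > 0}


def is_distance_k_coloring(adj, color, k):
    """Proper distance-k coloring: same-colored vertices are more than k apart."""
    return all(color[u] != color[v]
               for u in range(len(adj)) for v in within_distance(adj, u, k))


def is_ndc_coloring(S, color):
    """Color classes as groups: no row holds unknowns of two same-colored columns."""
    for row in S:
        used = [color[c] for c, s in enumerate(row) if s]
        if len(used) != len(set(used)):
            return False
    return True


def equivalent_on_all_colorings(S, num_colors):
    """Check NDC <=> proper distance-2 coloring for every coloring with num_colors colors."""
    adj = pattern_graph(S)
    return all(is_distance_k_coloring(adj, c, 2) == is_ndc_coloring(S, c)
               for c in product(range(num_colors), repeat=len(S)))
```

For the 5-cycle every pair of vertices is within distance 2, so only a coloring with five distinct colors passes, matching the earlier observation that an NDC for this pattern needs five groups.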

First  we  review  the  definition  of  NP-Completeness  (see  Garey  and  Johnson 
(1979)).  The  fundamental  NP-Complete  problem  is  the  Satisfiability  Problem 
(SAT),  which  we  use  in  a  slightly  simpler,  but  equivalent  form: 

3-Satisfiability (3SAT): Given a set of atoms $u_1, u_2, \ldots, u_n$, we get the set of
literals $L = \{u_1, \bar u_1, u_2, \bar u_2, \ldots, u_n, \bar u_n\}$. Let $C = \{C_1, C_2, \ldots, C_m\}$ be a set of
3-clauses drawn from L, that is, each $C_s \subseteq L$ and $|C_s| = 3$. Is there a truth
assignment $\tau : \{u_1, \ldots, u_n\} \to \{\text{true}, \text{false}\}$ such that each $C_s$ contains at least
one $u_i$ with $\tau(u_i) = \text{true}$ or at least one $\bar u_i$ with $\tau(u_i) = \text{false}$?
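For intuition only, a 3SAT instance can be decided by trying all $2^n$ truth assignments; no polynomial-time method is known, and the NP-Completeness of 3SAT is the reason one is not expected. In this hypothetical encoding the literal $u_i$ is the integer `i` and $\bar u_i$ is `-i`; `is_trivial` flags the clauses excluded below.

```python
from itertools import product


def satisfying_assignment(n, clauses):
    """Return a truth assignment {atom: bool} satisfying every clause, or None.

    Atoms are 1..n; the literal i stands for u_i and -i for its negation.
    Exhaustive search: 2**n assignments in the worst case.
    """
    for bits in product((False, True), repeat=n):
        tau = dict(zip(range(1, n + 1), bits))
        # u_i is true when tau[i] is True; its negation is true when tau[i] is False.
        if all(any(tau[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return tau
    return None


def is_trivial(clause):
    """A clause containing both an atom and its negation is always satisfied."""
    return any(-lit in clause for lit in clause)


# (u1 or u2 or u3) and (not u1 or not u2 or u3)
print(satisfying_assignment(3, [(1, 2, 3), (-1, -2, 3)]))  # {1: False, 2: False, 3: True}
```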

The set of clauses is really an abstraction of a logical formula; imagine the
clauses as parenthesized subformulae whose literals are connected by ‘or’, with
all the clauses connected by ‘and’. Then a satisfying truth assignment makes
the whole formula true. 3SAT has been shown to be “at least as hard as” a whole
class of hard problems (the class NP). Thus, if 3SAT can be encoded in polynomial
time into another problem X that itself belongs to NP, then X inherits the
“at least as hard as” property and is called NP-Complete.

In order to encode 3SAT into DkGCP, DkGCP must be recast as a decision
problem. As is standard with optimization problems, we rephrase DkGCP as
“Is there a distance-k coloring that uses p or fewer colors?” Our encoding is
a generalization of the one found in Karp’s original proof (1972) of the
NP-Completeness of GCP. The theorem about the encoding requires the exclusion
of the case in which a clause contains both an atom and its negation. But such
clauses are always trivially satisfied, so we henceforth understand “3SAT” to
mean “3-Satisfiability without trivial clauses”.

Given a 3SAT problem P, we construct from it a decision problem on a
graph $G_k(P)$. If P has atoms $u_1, u_2, \ldots, u_n$ and clauses $C_1, C_2, \ldots, C_m$, let
$h = \lceil k/2 \rceil$ and $p = 2nk + m(k - 1)$. Let V and E denote the vertices and edges of
$G_k(P)$, and define them by


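$$
V = \left\{
\begin{array}{lll}
u_i,\ \bar u_i, & i = 1, \ldots, n & \text{(literal vertices)} \\
F_i^r,\ T_i^r, & i = 1, \ldots, n,\ r = 1, \ldots, k & \text{(false vertices, true vertices)} \\
C_s^r, & s = 1, \ldots, m,\ r = 0, \ldots, k-1 & \text{(clause vertices)} \\
I_s^r, & s = 1, \ldots, m,\ r = 1, \ldots, k-1 & \text{(intermediate vertices)}
\end{array}
\right\}
$$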


E consists of the edges $\{u_i, \bar u_i\}$ for all i, together with further families of
edges among the literal, false, true, clause and intermediate vertices (indexed over
r, over all i ≠ j, and over the clauses). The families are chosen so that: $u_i$ and
$\bar u_i$ get different colors; all F's and T's get different colors; each $C_s^0$ gets a
different color than its literals; $u_i$ and $\bar u_i$ can only be colored $F_i^1$ or $T_i^1$; the
$C_s^r$, r > 0, get colors different from each other and from the F's and T's; and
$C_s^0$ can only be colored with the $F^1$ colors of its literals.
Note that we consider $G_k(P)$ only for $k \ge 2$, implying that $h + 1 \le k$, so $h + 1$
makes sense as a superscript for the F's and the T's. The global structure of
$G_k(P)$ looks like:


We need three propositions about the structure of a proper distance-k coloring
of $G_k(P)$.

Proposition 1. The vertices $F_i^r$, $T_i^r$ ($i = 1, \ldots, n$, $r = 1, \ldots, k$) and $C_s^q$
($s = 1, \ldots, m$, $q = 1, \ldots, k-1$) must all have different colors, thus using up all p colors.

Proof.  Consider  the  length  k  paths 


[Path diagram (3)]

which demonstrate that all F's and T's must be different colors. Now consider
the length k − 1 paths


[Path diagram (4)]


which show that no $C_s^q$, $0 < q \le h$, can be any $F_i^r$ or $T_i^r$ color, $r > h$. The
length at most k − 1 paths


[Path diagram]


show that no $C_s^q$, $h < q < k$, can be any $F_i^r$ or $T_i^r$ color, $r > h$. Let l be an
index such that $u_l \notin C_s$ and $\bar u_l \notin C_s$, and consider the length k − 1 paths


[Path diagram (5)]


which show that no $C_s^q$, $0 < q \le h$, can be any $T_i^r$ color, $r \le h$. The length k
paths


[Path diagram (6)]
show that no $C_s^q$, $0 < q \le h$, can be any $F_i^r$ color, $r \le h$. The length h paths


[Path diagram]


show that no $C_s^q$, $h < q < k$, can be any $F_i^r$ or $T_i^r$ color, $r \le h$. The length
k − 1 paths

[Path diagram (7)]

show that no $C_s^q$ can be the same color as any other $C_t^r$, $0 < q, r \le h$. Finally, the
length k − 1 path


[Path diagram (8)]


shows that no $C_s^q$ can be the same color as any other $C_t^r$, $h < q, r < k$. □

Since the $F_i^r$, $T_i^r$ and $C_s^q$, q > 0, use up all the colors, we subsequently refer to
the colors by these vertex names.

Proposition 2. Vertices $u_i$ and $\bar u_i$ must be colored $F_i^1$ and $T_i^1$ in some order,
i = 1, …, n.

Proof. Let j ≠ i and consider the length k paths


[Path diagram (9)]


which show that $u_i$ and $\bar u_i$ cannot be any color other than $F_i^1$ and $T_i^1$. Also, $u_i$
and $\bar u_i$ certainly cannot be the same color. □

Thus a proper distance-k coloring of $G_k(P)$ induces a truth assignment on
the literals.

Proposition 3. If the literals in clause $C_s$ have indices a, b, and c, then $C_s^0$ must
be colored $F_a^1$, $F_b^1$, or $F_c^1$, s = 1, …, m.

Proof. We can add $C_s^0$ to the beginning of the paths in (4), (5), (6), (7) and
(8), thus excluding all colors except the $F_i^1$ colors from $C_s^0$. If l is an index such that
$u_l \notin C_s$ and $\bar u_l \notin C_s$, then we can drop one edge from the corresponding path in (6)
and add $C_s^0$ to the beginning to show that $C_s^0$ cannot be colored $F_l^1$ either. □
Now we are ready to state and prove the NP-Completeness theorem, which
then immediately implies that finding an optimal NDC is NP-Complete. This
theorem is a symmetric version of a result in Section 3 of Coleman and Moré
(1981).

Theorem 1. For fixed k ≥ 2, DkGCP is NP-Complete.

Proof. Since the size of $G_k(P)$ is a polynomial in m and n, it is clear that the
above reduction of 3SAT to DkGCP can be carried out in polynomial time. We
must show that there is a satisfying truth assignment for the 3SAT problem P
if and only if the graph $G_k(P)$ has a proper distance-k coloring in p or fewer
colors.

First suppose that $G_k(P)$ is properly distance-k colored. If $l_a$, $l_b$ and $l_c$ are
the literals contained in $C_s$, then the length k path

$$ C_s^0 - I_s^{k-1} - I_s^{k-2} - \cdots - I_s^1 - l_a $$

shows that $C_s^0$ cannot be the same color as $l_a$, and similarly for $l_b$ and $l_c$. But $C_s^0$
must be colored $F_a^1$, $F_b^1$, or $F_c^1$ by Proposition 3; say it is colored $F_a^1$. By
Proposition 2, each $l_i$ is colored either $F_i^1$ or $T_i^1$, so $l_a$ is colored $T_a^1$. Hence each
clause contains at least one true literal under the truth assignment induced by the
proper coloring, i.e., the clauses are satisfiable.

Now we need only show that $G_k(P)$ can always be colored in p or fewer colors
if P is satisfiable. Let τ be some satisfying truth assignment for $C_1, C_2, \ldots, C_m$.
First color the $F_i^r$'s, $T_i^r$'s and $C_s^r$'s, r > 0, as decreed by Proposition 1. Color
$u_i$ with $T_i^1$ if $\tau(u_i) = \text{true}$, and color $u_i$ with $F_i^1$ otherwise; color $\bar u_i$ with the
complementary color. Each $C_s$ has at least one true literal, say $l_a$. Color $C_s^0$ with
color $F_a^1$. Finally, color $I_s^r$ with $C_{s+1}^r$, r = 1, …, k − 1, where the subscript on
$C_{s+1}^r$ is interpreted modulo m.

We now show that this coloring is proper. The colors $F_i^r$, $T_i^r$, $1 < r \le k$,
each appear on only one vertex and so are proper. Color $C_{s+1}^r$ appears on exactly
two vertices, itself and $I_s^r$. A shortest possible path between these vertices in
$G_k(P)$ is


[Path diagram]


and is of length k + 1. This is a shortest path because at least k edges must
be used to get from layer $I_s^r$ to layer $C_{s+1}^r$, and one extra edge must be used to
get from an F or a T to a C. Also, any alternative path between these vertices
that goes through a $C^0$ vertex has at least k + 2h edges because of the difference in
subscripts, and because the $C_s^r$'s do not interconnect for r < h; thus color $C_{s+1}^r$ is
proper. Color $T_i^1$ also appears on exactly two vertices, itself and one of $u_i$ or $\bar u_i$.
A shortest possible path between these vertices is the third one in (9) with $T_i^1$
added at the end. For j ≠ i, at least k edges must be used to get from the u
layer to the $T^1$ layer, and an extra edge is necessary to go from an i vertex to
a j vertex. Any other path between these vertices through the T's uses at least
k + 2h − 1 edges (see (3)), so $T_i^1$ is proper. Finally, $F_i^1$ can appear in three places:
on itself, on $u_i$ or $\bar u_i$, and on any number of $C_s^0$'s whose clauses contain $u_i$ or
$\bar u_i$. As with $u_i$ or $\bar u_i$ and $T_i^1$ above, $u_i$ or $\bar u_i$ and $F_i^1$ do not cause a conflict.
Some shortest possible paths between $F_i^1$ and any $C_s^0$ are those in (6) with $C_s^0$
added to the beginning. Again, at least k edges are necessary to go from the $C^0$
layer to the $F^1$ layer, and an extra edge is necessary to go from an l vertex to
an i vertex. In (3) we see that any other path between these vertices through
the T's uses at least 2k edges, so no $C_s^0$, $F_i^1$ pair causes a conflict. Between a $u_i$
or $\bar u_i$ and a $C_s^0$, some shortest possible paths are


[Path diagram]


of lengths k and k + 1 respectively. The first cannot exist because of the truth
assignment and because there are no trivial clauses. Once again, the second must
use k edges going from the u layer to the $C^0$ layer, and an extra edge going from an
F or a T to a C, so no $u_i$ or $\bar u_i$, $C_s^0$ pair conflicts. Finally, a shortest possible
path between $C_s^0$ and $C_t^0$ is (7) with $C_s^0$ added to the beginning and $C_t^0$ added
to the end, of length k + 1. In (3) we see that any other path between these
vertices through the T's uses at least 2k edges, so no $F_i^1$ color conflicts. Thus
the coloring is proper, and the theorem is proved. □


4.  Heuristics  for  Finding  Non-Overlapping  Direct  Covers 

Theorem  1  is,  unfortunately,  a  negative  result,  since  it  implies  that  finding 
an  optimal  NDC  is  very  hard.  On  a  more  positive  note,  much  work  has  been  done 
on  finding  near-optimal,  polynomial-time,  heuristic  algorithms  for  NP-Complete 
problems  (see  Garey  and  Johnson  (1979),  chapter  6). 

In  the  present  case,  the  most  obvious  heuristic  approach  is  to  reduce  D2GCP 
to  GCP  and  then  apply  known  heuristic  results  on  GCP  to  the  reduced  graph. 
Given a graph G = (V, E), define $D_2(G)$ (the distance-2 completion of G) to be
the graph on the same vertex set V, and with edges
$E' = \{\{i, j\} \mid i \text{ and } j \text{ are distance 2 or less apart in } G\}$. Then it is easy to verify that a coloring of V is a
proper distance-2 coloring of G if and only if it is a proper (distance-1) coloring
of $D_2(G)$ (note that this reduction also implies that D1GCP is NP-Complete).
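The distance-2 completion is straightforward to build, and the claimed equivalence can be confirmed on small graphs by enumerating colorings. Below is a sketch with hypothetical names; graphs are 0-based adjacency lists, and `reduction_holds` simply compares the two coloring tests on every coloring with a given number of colors.

```python
from itertools import product


def distance2_completion(adj):
    """D_2(G): join u and w whenever they are at distance 1 or 2 in G."""
    comp = [set() for _ in adj]
    for u, nbrs in enumerate(adj):
        for v in nbrs:
            comp[u].add(v)                                # distance 1
            comp[u].update(w for w in adj[v] if w != u)   # distance <= 2 via v
    return [sorted(s) for s in comp]


def is_proper_coloring(adj, color):
    """Ordinary (distance-1) coloring: adjacent vertices get different colors."""
    return all(color[u] != color[v] for u, nbrs in enumerate(adj) for v in nbrs)


def is_distance2_coloring(adj, color):
    """Vertices joined by a path of one or two edges get different colors."""
    for u, nbrs in enumerate(adj):
        for v in nbrs:
            if color[v] == color[u]:
                return False
            if any(color[w] == color[u] for w in adj[v] if w != u):
                return False
    return True


def reduction_holds(adj, num_colors):
    """Exhaustively confirm: distance-2 coloring of G <=> proper coloring of D_2(G)."""
    comp = distance2_completion(adj)
    return all(is_distance2_coloring(adj, c) == is_proper_coloring(comp, c)
               for c in product(range(num_colors), repeat=len(adj)))


path = [[1], [0, 2], [1, 3], [2, 4], [3]]   # the path 0 - 1 - 2 - 3 - 4
print(distance2_completion(path))  # [[1, 2], [0, 2, 3], [0, 1, 3, 4], [1, 2, 4], [2, 3]]
print(reduction_holds(path, 3))    # True
```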

If there were a “good” heuristic for GCP, then we could compose it with
$D_2(\cdot)$ to obtain a “good” heuristic for D2GCP. Coleman and Moré (1981), Section
4, gives a good overview of the present state of the art in GCP heuristics, which
is not “good”. In fact, if $c^*(G)$ denotes the number of colors used by the best
heuristic on graph G, and $\chi(G)$ denotes the optimal number of colors necessary
for G (its chromatic number), then in the worst case


$$ \max_{G \text{ on } n \text{ vertices}} \frac{c^*(G)}{\chi(G)} = O\!\left(\frac{n}{\log n}\right) \qquad (10) $$


(this  best  heuristic  and  the  bound  (10)  are  due  to  Johnson  (1974)).  Two  facts 
mitigate the unpleasantness of (10). First, the range of $D_2(\cdot)$ does not include all
graphs,  and  hence  a  better  bound  than  (10)  can  be  obtained  for  D2GCP.  Second, 
average-case  results  have  been  obtained  for  GCP  heuristics  that  are  considerably 
better  than  (10). 

To  improve  on  (10)  for  D2GCP,  consider  the  specific  heuristic  called  the 
distance-2 sequential algorithm (D2SA). Define $N(i) = \{\, j \ne i \mid j \text{ is distance} \le 2 \text{ from } i \,\}$,
the distance-2 neighborhood of a vertex i in a graph. Thus, if i has
color c in a proper distance-2 coloring, no $j \in N(i)$ can be color c. Then D2SA
assigns color

$$ \min\{\, c \ge 1 \mid \text{no } j \in N(i),\ j < i, \text{ is colored } c \,\} $$

to vertex i, i = 1, …, |V|. That is, D2SA assigns vertex i the smallest color
not conflicting with those already assigned. (This is just the distance-2 version
of the best-known GCP heuristic, the sequential algorithm, which is called the
CPR method in its applications to approximating sparse Jacobians (see Curtis,
Powell and Reid (1974)).) Let $c_s(G)$ denote the number of colors used by D2SA
when applied to G.
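Here is a minimal sketch of D2SA with hypothetical names. To make the effect of the vertex order visible, it is applied to the graph $G_3$ defined later in this section (vertices $x_{ij}$, edges $x_{ij} \sim x_{i+1,j+1}$, subscripts modulo 3 and 5), once with the vertices ordered by i and once ordered by j.

```python
def d2sa(adj, order):
    """Distance-2 sequential algorithm: visit vertices in `order` and give each the
    smallest color c >= 1 not already used within distance 2 of it."""
    color = {}
    for i in order:
        nbhd = set()
        for v in adj[i]:
            nbhd.add(v)            # distance 1
            nbhd.update(adj[v])    # distance <= 2
        nbhd.discard(i)            # N(i) excludes i itself
        used = {color[j] for j in nbhd if j in color}
        c = 1
        while c in used:
            c += 1
        color[i] = c
    return color


def graph_g3():
    """G_3: vertices x_ij = (i, j), edges x_ij ~ x_{i+1,j+1}, subscripts modulo 3 and 5."""
    adj = {(i, j): set() for i in range(1, 4) for j in range(1, 6)}
    for i, j in list(adj):
        w = (i % 3 + 1, j % 5 + 1)
        adj[(i, j)].add(w)
        adj[w].add((i, j))
    return adj


G3 = graph_g3()
by_i = sorted(G3)                                # x_11, x_12, ..., x_15, x_21, ...
by_j = sorted(G3, key=lambda v: (v[1], v[0]))    # x_11, x_21, x_31, x_12, ...
c_by_i = d2sa(G3, by_i)
c_by_j = d2sa(G3, by_j)
print(max(c_by_i.values()), max(c_by_j.values()))  # 3 5
```

Every vertex of $G_3$ has degree d = 2, so (11) gives $3 \le \chi_2(G_3)$ and $c_s(G_3) \le 5$; the two orders reach the two ends of that range.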

In order to obtain bounds on $c_s(G)$, we require two definitions. The maximum
degree of G, $\Delta(G)$, is defined as

$$ \Delta(G) = \max_i \left| \{\, j \mid \{i, j\} \in E(G) \,\} \right|. $$

The distance-2 chromatic number of G, $\chi_2(G)$, is defined as the optimal number
of colors in a proper distance-2 coloring of G, i.e.,

$$ \chi_2(G) = \min\{\, k \mid G \text{ has a proper distance-2 coloring with } k \text{ colors} \,\}. $$

The following theorem bounds $\chi_2(G)$ and $c_s(G)$ in terms of $\Delta(G)$, and a
corollary improves (10) for D2SA:

Theorem 2. Let $d = \Delta(G)$. Then

$$ d + 1 \le \chi_2(G) \le c_s(G) \le d^2 + 1 \qquad (11) $$

for all graphs G.

Proof. Let i be a vertex incident to exactly d edges, and note that i and its d
nearest  neighbors  must  all  be  different  colors  in  a  proper  distance-2  coloring;  this 
proves  the  lower  bound  in  (11).  The  second  inequality  in  (11)  is  trivial. 
To prove the upper bound in (11), note that for any vertex i, $|N(i)| \le d + d(d - 1) = d^2$.
Suppose that D2SA assigns color l to vertex i; by definition of
D2SA, this can happen only if at least one vertex of each color 1, …, l − 1 is
in N(i). Thus, if i were assigned color $l > d^2 + 1$, then $|N(i)| \ge l - 1 > d^2$ (a
contradiction). (This proof is essentially a constructive proof of Corollary 8.2.1
of Bondy and Murty (1976).) □

Corollary 1. For all n ≥ 1,

$$ \max_{G \text{ on } n \text{ vertices}} \frac{c_s(G)}{\chi_2(G)} \le \sqrt{n-1} + 1 = O(\sqrt{n}). \qquad (12) $$

Proof. Clearly, $c_s(G) \le n$. Let $k = \chi_2(G)$. Applying the first and third
inequalities of (11), we obtain

$$ c_s(G) \le (k-1)^2 + 1. \qquad (13) $$

Consider separately two cases:

Case 1: If $n \le (k-1)^2 + 1$, this implies that $\sqrt{n-1} + 1 \le k$, and so

$$ \frac{c_s(G)}{k} \le \frac{n}{k} \le \frac{n}{\sqrt{n-1}+1} = \sqrt{n-1} + 1 - 2 + \frac{2}{\sqrt{n-1}+1} \le \sqrt{n-1} + 1. $$

Case 2: If $n > (k-1)^2 + 1$, then $k < \sqrt{n-1} + 1$, and so

$$ \frac{c_s(G)}{k} \le \frac{(k-1)^2 + 1}{k} = k - 2 + \frac{2}{k} \le k < \sqrt{n-1} + 1. \qquad \square $$

Graphs that attain bound (13) for a certain ordering of their vertices exist for
k = 1, 2, 3, 4. The cases k = 1, 2 are trivial. For k = 3, consider $G_3 = (V_3, E_3)$
defined by

$$ V_3 = \{\, x_{ij} \mid i = 1, 2, 3,\ j = 1, 2, 3, 4, 5 \,\}, $$

$$ E_3 = \{\, \{x_{ij}, x_{i+1,j+1}\} \mid \text{all } i, j \,\} \quad \text{(subscripts modulo 3 and 5)}. $$

Then D2SA assigns $x_{ij}$ color i when the vertices are ordered by i (which is
optimal by (11)), and assigns color j when the vertices are ordered by j
(which is the worst possible, by (13)). For k = 4, consider $G_4 = (V_4, E_4)$ defined


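by

$$ V_4 = \{\, x_{ij} \mid i = 1, 2, 3, 4,\ j = 1, \ldots, 10 \,\}, $$

$$ E_4 = \bigcup_{\text{all } i} \Big( \{\, \{x_{ij}, x_{i+2,j+5}\} \mid \text{all } j \,\} \cup \{\, \ldots \mid \text{all odd } j \,\} \cup \{\, \{x_{ij}, x_{i+1,j+4}\} \mid \text{all even } j \,\} \Big) $$

(subscripts modulo 4 and 10).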
Then D2SA applied to $G_4$ also colors $x_{ij}$ with $i$ when ordered by $i$, and with $j$ when ordered by $j$ (which are again respectively optimal and worst possible).
However, this construction seems difficult to extend. Even if it can be extended, the number of vertices is given by $n = k((k-1)^2 + 1)$, so that for the worst ordering

$$\frac{c_S(G_k)}{\chi_2(G_k)} = \frac{(k-1)^2 + 1}{k} = O(n^{1/3}),$$

which grows more slowly than the bound in (12). Thus, while (11), (12) and (13) are better results than (10), I believe that the associated bounds are not the best possible.
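Both constructions can be checked mechanically by running a greedy distance-2 coloring with the two vertex orderings. The sketch below assumes D2SA is the greedy rule used in the proof of the $d^2 + 1$ bound. The edge family for odd $j$ in $G_4$ is not legible in the source, so the sketch assumes $(x_{ij}, x_{i+1,j+2})$ for odd $j$; this is a hypothetical choice that gives the stated behavior and may differ from the author's. The helper names `d2sa`, `g3` and `g4` are illustrative.

```python
def d2sa(adj, order):
    """Greedy distance-2 coloring in the given order; colors are 1, 2, ..."""
    color = {}
    for v in order:
        near = set(adj[v])
        for u in adj[v]:
            near |= adj[u]
        near.discard(v)
        taken = {color[u] for u in near if u in color}
        c = 1
        while c in taken:
            c += 1
        color[v] = c
    return color

def g3():
    """G_3: vertices (i, j), i = 1..3, j = 1..5; edges (x_ij, x_{i+1,j+1})."""
    adj = {(i, j): set() for i in range(1, 4) for j in range(1, 6)}
    for (i, j) in list(adj):
        t = (i % 3 + 1, j % 5 + 1)            # (i+1, j+1) modulo 3 and 5
        adj[(i, j)].add(t)
        adj[t].add((i, j))
    return adj

def g4():
    """G_4: vertices (i, j), i = 1..4, j = 1..10.
    ASSUMPTION: the odd-j family (x_ij, x_{i+1,j+2}) is a hypothetical choice."""
    def wrap(i, j):                           # subscripts modulo 4 and 10, 1-based
        return ((i - 1) % 4 + 1, (j - 1) % 10 + 1)
    adj = {(i, j): set() for i in range(1, 5) for j in range(1, 11)}
    for (i, j) in list(adj):
        targets = [wrap(i + 2, j + 5)]        # all j
        if j % 2 == 1:
            targets.append(wrap(i + 1, j + 2))  # odd j (assumed)
        else:
            targets.append(wrap(i + 1, j + 4))  # even j
        for t in targets:
            adj[(i, j)].add(t)
            adj[t].add((i, j))
    return adj

for k, adj in ((3, g3()), (4, g4())):
    by_i = d2sa(adj, sorted(adj))                              # ordered by i
    by_j = d2sa(adj, sorted(adj, key=lambda v: (v[1], v[0])))  # ordered by j
    print(k, len(adj), max(by_i.values()), max(by_j.values()))
# expected output: "3 15 3 5" then "4 40 4 10"
```

Each vertex of these graphs has all other $i$ values and all other $j$ values within distance 2, which is why the greedy rule reproduces the labels exactly.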

For the average case, Grimmett and McDiarmid (1975) proved the following theorem:

Theorem 3. Fix $n$ vertices, and let each pair of vertices $i$ and $j$ be connected by an edge independently with fixed probability $p$, $0 < p < 1$. Let $c_C(G)$ be the number of colors used by CPR on $G$, and $\chi(G)$ be the optimal number of colors (so that $c_C(G)$ and $\chi(G)$ are random variables). Then

$$\frac{c_C(G)}{\chi(G)} \le 2 + \epsilon$$

for every fixed $\epsilon > 0$, with probability $1 - o(1)$ as $n \to \infty$.

Thus,  on  average,  CPR  almost  never  performs  more  than  twice  as  badly  as 
the  optimal  strategy.  This  is  a  nice  result,  but  for  our  purposes  it  has  at  least  two 
flaws.  First,  sparsity  patterns  in  practical  problems  are  not  uniformly  random  as 
Theorem 3 supposes. Second, even if they were, the density of sparsity patterns tends to be $O(1/n)$ rather than constant as $n$ increases, so the theorem does not apply anyway. It would be useful to determine a better random model for sparsity patterns, or at least to prove Theorem 3 under the assumption that $p = O(1/n)$.


5.  Conclusions  and  Further  Questions 

As we move from Direct Methods to Elimination Methods, and from NDCs to GSimDCs within Direct Methods, we are less restricted, and so can find potentially more powerful methods. We also move from an NP-Complete problem (finding an optimal NDC) to a polynomially bounded one (finding an optimal general Elimination Method), by Theorem 1 and by Theorem 4 of the Appendix, respectively. It would be interesting to know what intermediate point divides NP-Complete methods from polynomial methods (if indeed there is a continuum at all).

In particular, it is usually easy to remove some restrictions so as to change an NDC method into a SeqDC method (see Powell and Toint (1979)), and to see what properties a graph coloring must have to be a SeqDC. Can this be used to prove a version of Theorem 1 for SeqDCs? SimDCs are harder to deal with because of their simultaneous nature, but they can still be shown to be equivalent to a form of graph "multi-coloring". Is the corresponding version of Theorem 1 still true? The expectation is that these graph coloring problems are also NP-Complete, which, if true, means that we must rely on heuristic algorithms to construct SeqDCs and SimDCs. Can the bound (12) be significantly improved, or, more importantly, can a provably better heuristic be found?

Other  interesting  questions  involve  the  observed  performance  of  heuristics. 
Even  CPR,  one  of  the  simplest  possible  heuristics,  seems  to  give  good  results  on 
practical problems (see Coleman and Moré (1981), Tables 3, 4 and 5). Can this
behavior  be  proved  under  some  convincing  randomness  assumption,  as  suggested 
at  the  end  of  Section  4?  Much  work  remains  to  be  done  before  we  know  whether 
we  are  approximating  sparse  Hessians  efficiently. 
Appendix 1. Elimination Methods

Assume that we approximate the Hessian of $F : \mathbb{R}^n \to \mathbb{R}$ using an elimination method by evaluating the gradient of $F$ at $x^0$ along directions $d^1, d^2, \ldots, d^k$. If no entry of the Hessian is known to be zero, there are $n(n+1)/2$ unknowns, namely $h_{ij}$ for all $1 \le i \le j \le n$. The following theorem is well known in the folklore, but the present author knows of no published proof:

Theorem 4. The maximum number of unknowns that can be determined from a set of gradient evaluations along any $k$ directions, $0 < k \le n$, is given by

$$r_{n,k} = n + (n-1) + \cdots + (n-k+1) = \frac{k(2n-k+1)}{2}.$$

In particular, in the completely dense case, $n$ evaluations are necessary to obtain all $n(n+1)/2$ unknowns.

Proof. Let $g(x)$ denote the gradient of $F$ at $x$, and $H(x)$ denote the Hessian of $F$ at $x$. Evaluating $g$ along the $k$ directions $d^1, d^2, \ldots, d^k$ produces the $nk$ approximate linear equations

$$(d^i)^T H(x^0) \approx \left(g(x^0 + d^i) - g(x^0)\right)^T, \qquad i = 1, \ldots, k. \qquad (14)$$


We assume that $H$ is symmetric, and so identify the unknowns $h_{ij}$ and $h_{ji}$ in (14). The number of unknowns that can be determined from equations (14) is bounded above by the rank of the $nk$ by $n(n+1)/2$ coefficient matrix of the $h_{ij}$'s when no $h_{ij}$ is assumed to be zero. Questions of rank could be affected by dependence among the $d^i$; we thus assume the Haar condition, namely that every maximal square submatrix of the $n$ by $k$ matrix whose $i$th column is $d^i$ is non-singular.

To describe the coefficient matrix, order the equations in (14) so that equations with left-hand side $(d^i)^T H e_1$, $i = 1, \ldots, k$, appear first, those with left-hand side $(d^i)^T H e_2$, $i = 1, \ldots, k$, appear next, etc. (where $e_l$ is the $l$th coordinate vector), and then order the unknowns as $h_{11}, h_{21}, h_{22}, h_{31}, h_{32}, \ldots, h_{nn}$. Given this ordering, call the coefficient matrix in (14) $A_k$; partition $A_k$ row-wise into $n$ blocks of $k$ rows each, and column-wise into $n$ blocks, where the $l$th column block has $l$ columns. Let the $(i, j)$ submatrix of the partition be denoted by $A^k_{ij}$, $i, j = 1, \ldots, n$. Define $c^j$ as the $k$-vector of the $j$th components of the $\{d^i\}$, i.e., $c^j = (d^1_j, d^2_j, \ldots, d^k_j)^T$, $j = 1, \ldots, n$. Each $A^k_{ij}$ is completely described by

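$$A^k_{ij} = \begin{cases} 0, & \text{if } i > j; \\ (0, \ldots, 0, c^j, 0, \ldots, 0) \text{ with } c^j \text{ in column } i, & \text{if } i < j; \\ (c^1, c^2, \ldots, c^i), & \text{if } i = j. \end{cases}$$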
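For example, when $n = 4$, $A_k$ has the form

$$\begin{array}{c|c|cc|ccc|cccc}
 & h_{11} & h_{21} & h_{22} & h_{31} & h_{32} & h_{33} & h_{41} & h_{42} & h_{43} & h_{44} \\ \hline
\text{row block } 1 & c^1 & c^2 & & c^3 & & & c^4 & & & \\
\text{row block } 2 & & c^1 & c^2 & & c^3 & & & c^4 & & \\
\text{row block } 3 & & & & c^1 & c^2 & c^3 & & & c^4 & \\
\text{row block } 4 & & & & & & & c^1 & c^2 & c^3 & c^4
\end{array} \qquad (15)$$

where zero elements are not shown.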
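The block description of $A_k$ translates directly into code. The sketch below (NumPy; the name `build_A` is ours) assembles $A_k$ in the ordering used above, with random directions standing in for directions that satisfy the Haar condition (random Gaussian directions do so with probability 1), and compares the numerical rank with $r_{n,k} = k(2n-k+1)/2$. Indices in the code are 0-based, so column block $j$ of the text is block `j - 1` here.

```python
import numpy as np

def build_A(D):
    """Coefficient matrix A_k of (14) for directions D (n-by-k, column i = d^i).

    Row j of D is (c^j)^T. Row block l (k rows) holds the equations for gradient
    component l; column block j holds h_{j1}, ..., h_{jj}.
    """
    n, k = D.shape
    A = np.zeros((n * k, n * (n + 1) // 2))
    for j in range(n):
        for m in range(j + 1):
            col = j * (j + 1) // 2 + m              # position of h_{jm}
            A[j * k:(j + 1) * k, col] += D[m]       # diagonal block (j, j): c^m
            if m < j:
                A[m * k:(m + 1) * k, col] += D[j]   # block (m, j): c^j, since h_{mj} = h_{jm}
    return A

rng = np.random.default_rng(0)
n = 6
for k in range(1, n + 1):
    D = rng.standard_normal((n, k))
    print(k, np.linalg.matrix_rank(build_A(D)), k * (2 * n - k + 1) // 2)

# Non-zero pattern for n = 4, k = 1; compare with (15).
print((build_A(rng.standard_normal((4, 1))) != 0).astype(int))
```

The rank is computed in floating point, so this is evidence for the formula on generic directions rather than a proof.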

To complete the proof, it must be shown that $\operatorname{rank}(A_k) = r_{n,k} = (n + (n-1) + \cdots + 2 + 1) - ((n-k) + (n-k-1) + \cdots + 2 + 1)$. We show first that the last $n - k$ columns of column block $n$, the last $n - k - 1$ columns of column block $n - 1$, $\ldots$, and the last column of column block $k + 1$ can be eliminated using linear combinations of the remaining columns. Note that the complementary set of columns, denoted by $C$, is precisely the set of columns whose non-zero entry of largest row index is $c^j$ for some $j \le k$.

Define $\lambda^l$ to be the solution of the system

$$(c^1\ c^2\ \cdots\ c^k)\,\lambda^l = -c^l, \qquad l = k+1, k+2, \ldots, n$$

($\lambda^l$ must be unique under the assumption of the Haar condition, since $(c^1\ \cdots\ c^k)$ is the transpose of the leading $k$ by $k$ submatrix of $(d^1\ \cdots\ d^k)$). The following computations show that linear combinations of the columns in $C$, using the $\lambda^l_i$ as multipliers, can be used to eliminate the columns mentioned above; since the form of the linear combinations is complicated, the result is best understood by referring to example (15) and assuming that $k = 2$.

To eliminate the last column of column block $n$ from $A_k$, we add to it $\lambda^n_j$ times column $j$ of column block $n$, $j = 1, \ldots, k$, and $\lambda^n_j \lambda^n_m$ times column $j$ of column block $m$, $m = 1, \ldots, k$, $j = 1, \ldots, m$. In row block $n$, the new last column is $c^n + \sum_{j=1}^{k} \lambda^n_j c^j$, which is zero by definition of $\lambda^n$ (all terms from column block $n$). In row blocks $k < i < n$, there is no contribution from any column block. In row blocks $i \le k$, the sum includes $\lambda^n_i c^n$ from column block $n$, zero from column block $m$ when $k < m < n$, $\lambda^n_i \lambda^n_k c^k$ from column block $k$, $\lambda^n_i \lambda^n_{k-1} c^{k-1}$ from column block $k - 1$, $\ldots$, $\lambda^n_i(\lambda^n_i c^i + \lambda^n_{i-1} c^{i-1} + \cdots + \lambda^n_1 c^1)$ from column block $i$, and zero from column blocks $m$ when $m < i$, for a total of

$$\lambda^n_i \left(c^n + (\lambda^n_1 c^1 + \lambda^n_2 c^2 + \cdots + \lambda^n_k c^k)\right),$$

which is again zero. Thus the last column of column block $n$ is dependent on the columns in $C$, and by symmetry so is the last column of each column block $k + 1, k + 2, \ldots, n - 1$.
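This first elimination step can be checked numerically. The sketch below builds $A_k$ for $n = 6$, $k = 2$ with random directions (an assumption standing in for the Haar condition), solves $(c^1\ \cdots\ c^k)\lambda^n = -c^n$, and forms the coefficient vector of the stated combination: the last column of block $n$, plus $\lambda^n_j$ times column $j$ of block $n$, plus $\lambda^n_j\lambda^n_m$ times column $j$ of block $m$. $A_k$ applied to that vector should vanish up to rounding. The helpers `build_A` and `col` are illustrative, and indices are 0-based.

```python
import numpy as np

def col(j, m):
    """0-based index of h_{jm} (j >= m) in the order h11, h21, h22, h31, ..."""
    return j * (j + 1) // 2 + m

def build_A(D):
    """A_k of (14); D is n-by-k with column i = d^i, so row j of D is (c^j)^T."""
    n, k = D.shape
    A = np.zeros((n * k, n * (n + 1) // 2))
    for j in range(n):
        for m in range(j + 1):
            A[j * k:(j + 1) * k, col(j, m)] += D[m]      # c^m in row block j
            if m < j:
                A[m * k:(m + 1) * k, col(j, m)] += D[j]  # c^j in row block m
    return A

rng = np.random.default_rng(1)
n, k = 6, 2
D = rng.standard_normal((n, k))
A = build_A(D)
C = D[:k].T                                  # the k-by-k matrix (c^1 ... c^k)
lam_n = np.linalg.solve(C, -D[n - 1])        # lambda^n

v = np.zeros(A.shape[1])
v[col(n - 1, n - 1)] = 1.0                   # last column of column block n
for j in range(k):
    v[col(n - 1, j)] += lam_n[j]             # lambda^n_j times column j of block n
for m in range(k):
    for j in range(m + 1):
        v[col(m, j)] += lam_n[j] * lam_n[m]  # lambda^n_j lambda^n_m, column j of block m
residual = np.abs(A @ v).max()
print(residual)                              # zero up to rounding error
```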

Now we eliminate column $n - 1$ of column block $n$ using the columns in $C$. Add to it $\lambda^{n-1}_j$ times column $j$ of column block $n$, $j = 1, \ldots, k$, $\lambda^n_j$ times column $j$ of column block $n - 1$, $j = 1, \ldots, k$, and $(\lambda^n_j \lambda^{n-1}_m + \lambda^{n-1}_j \lambda^n_m)$ times column $j$ of column block $m$, $m = 1, \ldots, k$, $j = 1, \ldots, m$. In row block $n$ the new column $n - 1$ of column block $n$ is $c^{n-1} + \sum_{j=1}^{k} \lambda^{n-1}_j c^j$, which is zero by the definition of $\lambda^{n-1}$ (all terms from column block $n$). In row block $n - 1$, the new column is $c^n + \sum_{j=1}^{k} \lambda^n_j c^j$, which is again zero (the first term from column block $n$, the rest from column block $n - 1$). In row block $k < i < n - 1$, no contribution is made by any column block. In row block $i \le k$, the sum includes $\lambda^{n-1}_i c^n$ from column block $n$, $\lambda^n_i c^{n-1}$ from column block $n - 1$, zero from column block $m$ when $k < m < n - 1$, $(\lambda^n_i \lambda^{n-1}_k + \lambda^{n-1}_i \lambda^n_k) c^k$ from column block $k$, $\ldots$,

$$\lambda^n_i(\lambda^{n-1}_i c^i + \lambda^{n-1}_{i-1} c^{i-1} + \cdots + \lambda^{n-1}_1 c^1) + \lambda^{n-1}_i(\lambda^n_i c^i + \lambda^n_{i-1} c^{i-1} + \cdots + \lambda^n_1 c^1)$$

from column block $i$, and zero from column block $m$ when $m < i$, for a total of

$$\lambda^n_i\left(c^{n-1} + (\lambda^{n-1}_k c^k + \cdots + \lambda^{n-1}_1 c^1)\right) + \lambda^{n-1}_i\left(c^n + (\lambda^n_k c^k + \cdots + \lambda^n_1 c^1)\right) = 0.$$

Thus column $n - 1$ of column block $n$ is also dependent on the columns in $C$, and again by symmetry, so are all columns $j$ in column blocks $m$ with $k < j < m$.
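The second combination can be checked the same way. The sketch below (again with illustrative helpers `build_A` and `col`, 0-based indices, and random directions assumed generic) takes column $n - 1$ of column block $n$ and adds the stated multiples, with multipliers $\lambda^{n-1}_j$, $\lambda^n_j$ and $\lambda^n_j\lambda^{n-1}_m + \lambda^{n-1}_j\lambda^n_m$. Note that for $j = m$ the last multiplier equals $2\lambda^n_m\lambda^{n-1}_m$, which is exactly what the row-block computation requires. The check needs $k \le n - 2$.

```python
import numpy as np

def col(j, m):
    """0-based index of h_{jm} (j >= m) in the order h11, h21, h22, h31, ..."""
    return j * (j + 1) // 2 + m

def build_A(D):
    """A_k of (14); D is n-by-k with column i = d^i, so row j of D is (c^j)^T."""
    n, k = D.shape
    A = np.zeros((n * k, n * (n + 1) // 2))
    for j in range(n):
        for m in range(j + 1):
            A[j * k:(j + 1) * k, col(j, m)] += D[m]      # c^m in row block j
            if m < j:
                A[m * k:(m + 1) * k, col(j, m)] += D[j]  # c^j in row block m
    return A

rng = np.random.default_rng(2)
n, k = 6, 2
D = rng.standard_normal((n, k))
A = build_A(D)
C = D[:k].T                                  # (c^1 ... c^k)
lam_n = np.linalg.solve(C, -D[n - 1])        # lambda^n
lam_n1 = np.linalg.solve(C, -D[n - 2])       # lambda^{n-1}

w = np.zeros(A.shape[1])
w[col(n - 1, n - 2)] = 1.0                   # column n-1 of column block n
for j in range(k):
    w[col(n - 1, j)] += lam_n1[j]            # lambda^{n-1}_j, column j of block n
    w[col(n - 2, j)] += lam_n[j]             # lambda^n_j, column j of block n-1
for m in range(k):
    for j in range(m + 1):
        w[col(m, j)] += lam_n[j] * lam_n1[m] + lam_n1[j] * lam_n[m]
residual = np.abs(A @ w).max()
print(residual)                              # zero up to rounding error
```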

By eliminating $1 + 2 + \cdots + (n - k)$ columns we have shown that $\operatorname{rank}(A_k) \le r_{n,k}$. To show that $\operatorname{rank}(A_k) = r_{n,k}$, delete the columns that were eliminated above from $A_k$, and delete the last $k - i$ rows from each row block $i$, $i < k$. Then the remaining matrix is $r_{n,k}$ by $r_{n,k}$ and is block upper triangular with square, non-singular diagonal blocks. Thus this submatrix of $A_k$ is non-singular, so $\operatorname{rank}(A_k) \ge r_{n,k}$, and the theorem is proved. □


A corollary to this proof is that the minimum number of gradient evaluations necessary to approximate a Hessian (sparse or dense) can be found in time bounded by $O(n^7)$ by the following procedure:

1. Set $k = 1$.

2. Form $A_k$, deleting the columns corresponding to unknowns $h_{ij}$ known to be zero.

3. Evaluate $\operatorname{rank}(A_k) = t_k$, say. If $t_k \ge$ the number of unknowns, then $k$ is optimal; otherwise, set $k = k + 1$ and go to (2).

Step 3 can be performed at most $n$ times, on a matrix whose number of columns is $O(n^2)$. Evaluation of rank requires $O(|\text{columns}|^3)$ operations, giving a total bound of $O(n^7)$. In practice, Theorem 4 can be used to get reasonable bounds on $k$, making the work more like $O(n^6)$. However, numerical difficulties may make a general elimination method untrustworthy in any case.
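The three-step procedure can be sketched as follows, assuming directions drawn at random (which attain the maximal rank with probability 1) and a symmetric Boolean array `pattern` marking the Hessian entries that may be non-zero. The function names are hypothetical. The floating-point rank decision in step 3 is exactly where the numerical difficulties mentioned above arise, so this is an illustration rather than a robust tool.

```python
import numpy as np

def build_A(D):
    """A_k of (14); D is n-by-k with column i = d^i, so row j of D is (c^j)^T."""
    n, k = D.shape
    A = np.zeros((n * k, n * (n + 1) // 2))
    for j in range(n):
        for m in range(j + 1):
            c = j * (j + 1) // 2 + m                 # position of h_{jm}
            A[j * k:(j + 1) * k, c] += D[m]
            if m < j:
                A[m * k:(m + 1) * k, c] += D[j]
    return A

def min_gradient_evaluations(pattern, seed=0):
    """Smallest k whose A_k, restricted to the unknowns, has full column rank."""
    pattern = np.asarray(pattern, dtype=bool)
    n = pattern.shape[0]
    keep = [j * (j + 1) // 2 + m
            for j in range(n) for m in range(j + 1) if pattern[j, m]]
    if not keep:
        return 0                                     # nothing to determine
    rng = np.random.default_rng(seed)
    for k in range(1, n + 1):                        # step 1 and the loop back to step 2
        A = build_A(rng.standard_normal((n, k)))[:, keep]   # step 2: drop known zeros
        if np.linalg.matrix_rank(A) >= len(keep):    # step 3
            return k
    return n

tri = np.eye(5, dtype=bool) | np.eye(5, k=1, dtype=bool) | np.eye(5, k=-1, dtype=bool)
print(min_gradient_evaluations(np.ones((4, 4), dtype=bool)))  # dense case
print(min_gradient_evaluations(tri))                          # tridiagonal case
```

For the dense pattern the answer agrees with Theorem 4 ($k = n$). For a tridiagonal pattern two evaluations suffice, since each row's unknowns can be recovered by substitution once the previous row is known.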